The impact of different training sets on medical documents classification

Authors

Roberto Gatta

Mauro Vallati

Berardino De Bari

Mahmut Ozsahin

Abstract

Clinical documents stored as unstructured text are a valuable source of information that can be extracted using Information Retrieval techniques. Classification algorithms can be used to organize this large amount of data, but they are usually tested on standardized corpora, which differ significantly from the actual clinical documents found in a modern hospital. As a result, observed performance differs from expected performance. Given such differences, it is unclear what the “right” training set should look like and how its characteristics affect classification performance. In this paper we present the results of an experimental analysis, conducted on actual clinical documents from a medical department, which aims to evaluate the impact of differently sized and assembled training sets on well-known classification techniques.

Similar resources

Using Fuzzy LR Numbers in Bayesian Text Classifier for Classifying Persian Text Documents

Text classification is an important research field in information retrieval and text mining. The main task in text classification is to assign text documents to predefined categories based on the documents’ contents and labeled training samples. Since word detection is a difficult and time-consuming task in the Persian language, a Bayesian text classifier is an appropriate approach to deal with different...

Factors Influencing Drug Injection History among Prisoners: A Comparison between Classification and Regression Trees and Logistic Regression Analysis

Background: Due to the importance of medical studies, researchers in this field should be familiar with various types of statistical analyses so they can select the most appropriate method based on the characteristics of their data sets. Classification and regression trees (CARTs) can be complementary to regression models. We compared the performance of a logistic regression model and a CART in predi...

Face Recognition using Eigenfaces, PCA and Support Vector Machines

This paper is based on a combination of principal component analysis (PCA), eigenfaces and support vector machines. Using the N-fold method and with respect to the value of N, each person’s face images are divided into two sections. As a result, vectors of training features and test features are obtained. Classification precision and accuracy were examined with three different types of kernel and...

Fault Detection of Anti-friction Bearing using Ensemble Machine Learning Methods

The Anti-Friction Bearing (AFB) is a very important machine component, and its unscheduled failure causes malfunction in a wide range of rotating machinery, resulting in unexpected downtime and economic loss. In this paper, ensemble machine learning techniques are demonstrated for the detection of different AFB faults. Initially, statistical features were extracted from temporal vibratio...

Publication date: 2014
